Skip to content

refactor: dedup regex/validation/provider code (file-search-on); fix JSON in predicate#143

Merged
richardwooding merged 2 commits into
mainfrom
refactor/file-search-on-dedup
Jun 14, 2026
Merged

refactor: dedup regex/validation/provider code (file-search-on); fix JSON in predicate#143
richardwooding merged 2 commits into
mainfrom
refactor/file-search-on-dedup

Conversation

@richardwooding

Copy link
Copy Markdown
Contributor

Summary

Two related changes, in two commits for easy review:

  1. fix: per-dialect JSON array in membership predicate — each dialect now
    emits the full boolean predicate for elem in jsonArray (element + array)
    instead of relying on the caller to prepend elem = . Fixes semantically
    wrong SQL on MySQL (JSON_OVERLAPS), SQLite/DuckDB (EXISTS … json_each),
    and BigQuery (IN UNNEST(JSON_VALUE_ARRAY(...))); PostgreSQL unchanged.
    Ported from cel2sql4j@1835215. Already noted under [Unreleased] in CHANGELOG.

  2. refactor: deduplicate code surfaced by file-search-on — used the
    file-search-on MCP server (find_near_duplicates, dead_code, code_graph)
    to find duplication/dead code, then consolidated it.

Refactor detail

Area Before After
Regex validation 5 near-identical dialect/*/regex.go, each reimplementing the RE2 ReDoS checks with drift (only some compiled the pattern; nested-quantifier / nesting-depth loops differed) One canonical dialect/internal/regexsafe.Validate; each dialect keeps only its char-class transform
Field-name validation 6 copies of the validateFieldName skeleton Shared dialect/internal/identsafe.ValidateFieldName; dialects keep their own keyword sets / length limits
Type providers 5 byte-identical types.Provider boilerplates internal/celprovider.Base with a TypeMapper hook, embedded by mysql/sqlite/duckdb/bigquery/spark (pg keeps its richer nested/composite implementation)
Dead code unused top-level escapeJSONFieldName in utils.go removed, with orphaned tests

The regex consolidation also closes a security-consistency gap: the ReDoS
validation is now identical across all dialects rather than drifting per copy.

Net ~1,040 fewer lines; +312 lines of new single-source-of-truth code.

Verification

  • go build ./..., go vet ./...: clean
  • golangci-lint run ./...: 0 issues
  • gofmt: clean
  • All non-Docker tests pass (bigquery, duckdb, spark, sqlite, regexsafe, testutil green; regex/ReDoS/validation suites green). The only failures locally are testcontainer integration tests that need Docker — environmental, not from these changes (zero assertion failures). CI with Docker will exercise those.
  • Re-ran find_near_duplicates: the regex.go and provider.go clusters are gone.

🤖 Generated with Claude Code

Richard Wooding and others added 2 commits June 14, 2026 17:33
Each dialect now owns the full boolean predicate for `elem in jsonArray`,
emitting both the element and the array expression instead of relying on the
caller to prepend `elem = `. Fixes semantically wrong SQL on MySQL, SQLite,
DuckDB, and BigQuery; PostgreSQL semantics unchanged.

Ported from cel2sql4j (SPANDigital/cel2sql4j@1835215).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Used the file-search-on MCP server (find_near_duplicates, dead_code,
code_graph) to locate duplication and dead code, then consolidated:

- regex: extract the shared RE2 ReDoS/unsupported-feature validation from the
  five dialect regex.go files into dialect/internal/regexsafe.Validate,
  eliminating subtle per-dialect drift (e.g. only some compiled the pattern).
  Each dialect keeps only its character-class transform.
- validation: extract the shared validateFieldName skeleton into
  dialect/internal/identsafe.ValidateFieldName; dialects keep their own keyword
  sets and length limits.
- providers: add internal/celprovider.Base implementing the shared
  types.Provider boilerplate over a schema map with a TypeMapper hook; the
  flat providers (mysql, sqlite, duckdb, bigquery, spark) embed it. pg keeps
  its own implementation (nested/composite schema resolution + pool ownership).
- dead code: remove the unused top-level escapeJSONFieldName from utils.go
  (superseded by per-dialect copies) and its orphaned tests.

Net ~1,040 fewer lines; behavior preserved (all non-Docker tests pass,
lint clean).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@richardwooding richardwooding merged commit 2e0fc0e into main Jun 14, 2026
9 checks passed
@richardwooding richardwooding deleted the refactor/file-search-on-dedup branch June 14, 2026 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant